from matplotlib import rc
rc('animation', html='jshtml')
The COVID-19 pandemic has disrupted daily life around the world since the virus was first identified in Wuhan, China, in December 2019. It has affected not only people's daily lives but also economies, education systems, and healthcare systems worldwide, leading to widespread lockdowns, travel restrictions, and vaccination efforts. The virus spread rapidly: more than 200 million positive cases and 4 million deaths have been reported globally. It is therefore essential to understand how the virus spreads and how it affects different regions and countries.
We analyze the spread of COVID-19 using data maintained by governments across the globe and by the World Health Organization (WHO).
# Importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime # for datetime calculations
# import plotly
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
import requests
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')
pio.renderers.default='notebook'
from plotly.offline import init_notebook_mode
init_notebook_mode(connected=True)
We use the WHO dataset, which contains the cumulative number of cases and deaths, along with the new cases and new deaths reported each day, from January 3, 2020 to April 26, 2023. That is 1210 days of data.
df = pd.read_csv('data/WHO-COVID-19-global-data.csv')
# Convert str to date object
df['Date_reported'] = pd.to_datetime(df['Date_reported'])
df.head()
| | Date_reported | Country_code | Country | WHO_region | New_cases | Cumulative_cases | New_deaths | Cumulative_deaths |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-03 | AF | Afghanistan | EMRO | 0 | 0 | 0 | 0 |
| 1 | 2020-01-04 | AF | Afghanistan | EMRO | 0 | 0 | 0 | 0 |
| 2 | 2020-01-05 | AF | Afghanistan | EMRO | 0 | 0 | 0 | 0 |
| 3 | 2020-01-06 | AF | Afghanistan | EMRO | 0 | 0 | 0 | 0 |
| 4 | 2020-01-07 | AF | Afghanistan | EMRO | 0 | 0 | 0 | 0 |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 286770 entries, 0 to 286769
Data columns (total 8 columns):
 #   Column             Non-Null Count   Dtype
---  ------             --------------   -----
 0   Date_reported      286770 non-null  datetime64[ns]
 1   Country_code       285560 non-null  object
 2   Country            286770 non-null  object
 3   WHO_region         286770 non-null  object
 4   New_cases          286770 non-null  int64
 5   Cumulative_cases   286770 non-null  int64
 6   New_deaths         286770 non-null  int64
 7   Cumulative_deaths  286770 non-null  int64
dtypes: datetime64[ns](1), int64(4), object(3)
memory usage: 17.5+ MB
# Number of unique countries
print(f"Number of countries: {df.Country.unique().shape[0]}")
Number of countries: 237
# time range
print("External Data")
print(f"Earliest Entry: {df['Date_reported'].min().date()}")
print(f"Last Entry: {df['Date_reported'].max().date()}")
# Get the number of days between the first and last entry
print(
f"Total Days: {(df['Date_reported'].max() - df['Date_reported'].min()).days + 1} days")
External Data
Earliest Entry: 2020-01-03
Last Entry: 2023-04-26
Total Days: 1210 days
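As a quick consistency check on the figures printed so far, the sketch below recomputes the inclusive day count and compares it with the row count reported by `df.info()`. It assumes (this is not stated by WHO in the notebook) that the file holds exactly one row per country per day; the constants are copied from the outputs above rather than read from the CSV.

```python
from datetime import date

# Figures copied from the notebook outputs above
first_day = date(2020, 1, 3)
last_day = date(2023, 4, 26)
n_countries = 237
n_rows = 286_770

# Inclusive number of days between the first and last entry
n_days = (last_day - first_day).days + 1

# Assumption: one row per country per day, so rows = countries x days
expected_rows = n_countries * n_days

print(n_days, expected_rows, expected_rows == n_rows)
# prints: 1210 286770 True
```

Because the product matches, the dataset appears to be a complete country-by-day grid, which is why early rows such as Afghanistan in January 2020 contain zeros rather than being absent.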
cumulative_cases = df["Cumulative_cases"].groupby(df['Country']).max()
# plot bar chart of cumulative cases of the top 40 countries
cumulative_cases.nlargest(40).plot(kind='bar', figsize=(
12, 6), title="Cumulative Cases of Top 40 Countries")
plt.ylabel("Cumulative Cases")
plt.show()
The bar plot above shows the cumulative cases of the top 40 countries. The United States, China, India, France, Germany, Brazil, Japan, Korea, Italy, and the UK top the list. It is surprising that, apart from India and Brazil, all of these are developed countries. Despite being advanced in both economic and technological terms, these countries apparently could not contain the outbreak.
# plot bar chart of cumulative deaths of top 40 countries
cumulative_deaths = df["Cumulative_deaths"].groupby(df['Country']).max()
cumulative_deaths.nlargest(40).plot(kind='bar', figsize=(
12, 6), title="Cumulative Deaths of Top 40 Countries")
plt.ylabel("Cumulative Deaths")
plt.show()
The deaths, however, show a different trend. It is surprising that China, which had about 100 million cases, has only about 0.2 million deaths, while Mexico, which had around 10 million cases, has about 0.4 million deaths. Countries such as Mexico, Peru, Indonesia, South Africa, and Romania move up the list to occupy top positions in cumulative deaths, which indicates a high death ratio (deaths/cases) in those countries. Most of these are developing countries, so lack of awareness and limited access to adequate health systems could be the main reasons for such a high death ratio. The death ratio plot below supports this reading.
# Plot the 40 countries with the highest death ratio (cumulative deaths / cumulative cases)
death_ratio = cumulative_deaths / cumulative_cases
death_ratio.nlargest(40).plot(kind='bar', figsize=(
12, 6), title="Death Ratio of Top 40 Countries")
plt.ylabel("Death Ratio")
plt.show()
From the above plot, we can see that none of the developed countries with large case counts, such as China, Japan, Korea, Australia, and France, is in the top 40 in terms of death ratio. India is not among the top 40 either, which suggests that health facilities and public awareness helped in the fight against the virus. All the countries in the plot are either developing or underdeveloped.
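A raw deaths/cases ratio can be dominated by countries with very few reported cases, where a handful of deaths produces an extreme value. The sketch below shows one possible guard: filtering on a minimum case count before ranking. The toy totals and the `MIN_CASES` threshold are hypothetical and are not taken from the WHO file or from the analysis above.

```python
import pandas as pd

# Hypothetical per-country totals, for illustration only
totals = pd.DataFrame(
    {
        "Cumulative_cases": [1_000_000, 50_000, 0, 40],
        "Cumulative_deaths": [10_000, 2_500, 0, 8],
    },
    index=["A", "B", "C", "D"],
)

# Assumed threshold: ignore countries with too few cases for a stable ratio
MIN_CASES = 1_000

eligible = totals[totals["Cumulative_cases"] >= MIN_CASES]

# Death ratio as a percentage, highest first
death_ratio_pct = (
    eligible["Cumulative_deaths"] / eligible["Cumulative_cases"] * 100
).sort_values(ascending=False)

print(death_ratio_pct)
```

Here country D (8 deaths out of 40 cases) and country C (no cases, which would give 0/0) are excluded, so only A and B are ranked. The right threshold is a judgment call and would need to be chosen with the real data in view.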
data = df[df["Country"] == "India"]
# plot daily cases in India
fig = px.bar(data, x='Date_reported', y='New_cases',
title="Daily Cases in India", labels={"Date_reported": "Date", "New_cases": "New cases"})
fig.show()
# plot daily deaths in India
fig = px.bar(data, x='Date_reported', y='New_deaths',
title="Daily Deaths in India", labels={"Date_reported": "Date", "New_deaths": "New deaths"})
fig.show()
The plots above show the number of new cases and new deaths per day in India over the span of three years. They indicate three significant outbreaks of COVID-19 in India, one each in 2020, 2021, and 2022. Each time people thought that everything was under control, a new wave began. The outbreak in 2021 was clearly more severe than the one in 2020; however, the more severe wave (2021) did not last as long as the milder one (2020). These are also the two periods during which India was under lockdown. There is another peak in new cases in January-February 2022, but it is far less pronounced in terms of deaths per day.
The plot below shows the cumulative deaths of all countries worldwide. From January 2020 to December 2021, deaths rise fastest in the USA, the South American countries, India, and Russia. Interestingly, the number of deaths in China stays low over those two years compared with the rest of the world. Also, apart from South Africa, none of the African countries appears to have been much affected by the pandemic.
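Daily counts are noisy (reporting often dips on some days and catches up on others), so waves are usually easier to see on a smoothed series. The sketch below applies a 7-day rolling mean with pandas. The toy numbers are hypothetical, and the column name `New_cases_7d_avg` is introduced here only for illustration.

```python
import pandas as pd

# Hypothetical daily counts with reporting gaps, for illustration only
daily = pd.DataFrame(
    {
        "Date_reported": pd.date_range("2021-04-01", periods=14, freq="D"),
        "New_cases": [100, 120, 0, 260, 150, 160, 170,
                      180, 200, 0, 420, 230, 240, 250],
    }
)

# 7-day rolling mean; the first six days have no full window and stay NaN
daily["New_cases_7d_avg"] = (
    daily["New_cases"].rolling(window=7, min_periods=7).mean()
)

print(daily)
```

The same call could be applied to the India subset used above and plotted in place of, or on top of, the raw bars.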
df = pd.read_csv('data/WHO-COVID-19-global-data.csv')
data = df[df['Date_reported'] >= '2020-01-01']
# plot the cumulative deaths on the world map with a date slider
fig = px.choropleth(data, locations="Country", locationmode='country names',
color="Cumulative_deaths", hover_name="Country",
animation_frame="Date_reported", title='Cumulative deaths from 2020-01-01',
color_continuous_scale=px.colors.sequential.Plasma, range_color=(min(data.Cumulative_deaths), max(data.Cumulative_deaths)))
fig.show()
The plot below shows the cumulative cases of all countries worldwide. From January 2020 to December 2021, cases rise fastest in the USA, the South American countries, India, and Russia. Interestingly, the number of cases in China stays low over those two years compared with the rest of the world; however, cases there appear to have increased in the last five months of the data (December 2022-April 2023). Here too, apart from South Africa, none of the African countries appears to have been much affected by the pandemic.
# plot the cases heatmap on the world map with date slider
fig = px.choropleth(data, locations="Country", locationmode='country names',
color="Cumulative_cases", hover_name="Country",
animation_frame="Date_reported", title='Cumulative cases from 2020-01-01',
color_continuous_scale=px.colors.sequential.Plasma, range_color=(min(data.Cumulative_cases), max(data.Cumulative_cases)))
fig.show()
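With one animation frame per day, the choropleths above carry roughly 1210 frames each, which can make the figure large and slow to render. One possible alternative, sketched below on hypothetical toy data, is to keep only the last observation of each month per country before plotting. The `Month` column is an illustrative name, not part of the WHO file.

```python
import pandas as pd

# Hypothetical cumulative counts for two countries, for illustration only
toy = pd.DataFrame(
    {
        "Country": ["A"] * 4 + ["B"] * 4,
        "Date_reported": pd.to_datetime(
            ["2020-01-30", "2020-01-31", "2020-02-01", "2020-02-02"] * 2
        ),
        "Cumulative_cases": [1, 2, 3, 5, 10, 20, 30, 50],
    }
)

# String month label, suitable as an animation frame key
toy["Month"] = toy["Date_reported"].dt.strftime("%Y-%m")

# Keep the last (latest) row of each month for every country
monthly = (
    toy.sort_values("Date_reported")
    .groupby(["Country", "Month"], as_index=False)
    .last()
)

print(monthly)
```

Passing a frame like `monthly` to `px.choropleth` with `animation_frame="Month"` would then give one frame per month instead of one per day, at the cost of hiding day-to-day changes.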